Overview

Dataset Statistics

Number of Variables 2
Number of Rows 737
Missing Cells 0
Missing Cells (%) 0.0%
Duplicate Rows 10
Duplicate Rows (%) 1.4%
Total Size in Memory 1.6 MB
Average Row Size in Memory 2.2 KB
Variable Types
  • Categorical: 2
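
The duplicate figures in this report (10 of 737 rows, shown as 1.4% and 1.36%) match a count of every row that repeats an earlier one, since 737 rows minus 727 distinct text values gives 10. The sketch below reproduces that calculation with the standard library. The function name and the toy rows are hypothetical, and the profiling tool's exact definition of a duplicate is an assumption.

```python
from collections import Counter


def duplicate_stats(rows):
    """Return (number of rows that repeat an earlier row, their share in percent)."""
    counts = Counter(rows)
    # Each distinct row contributes (occurrences - 1) duplicates.
    duplicates = sum(n - 1 for n in counts.values())
    pct = 100.0 * duplicates / len(rows) if rows else 0.0
    return duplicates, pct


# Hypothetical toy rows of (text, topic); not taken from the profiled dataset.
rows = [("a", "x"), ("b", "y"), ("a", "x"), ("c", "z")]
print(duplicate_stats(rows))
```

Applied to the reported totals, 100 * 10 / 737 rounds to 1.36, which agrees with the percentage in the insights.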

Dataset Insights

  • Duplicates: dataset has 10 (1.36%) duplicate rows
  • High Cardinality: text has a high cardinality (727 distinct values)

Variables


text

categorical

Approximate Distinct Count 727
Approximate Unique (%) 98.6%
Missing 0
Missing (%) 0.0%
Memory Size 1.6 MB

Length

Mean 1965.3962
Standard Deviation 1067.0111
Median 1712
Minimum 648
Maximum 9646
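
The Length block summarises the character length of each value. A minimal sketch of how such a summary could be computed is shown here. The function name is hypothetical, and the use of the sample standard deviation is an assumption, since the report does not say which variant it uses.

```python
import statistics


def length_summary(values):
    """Summarise the character lengths of a list of strings."""
    lengths = [len(v) for v in values]
    return {
        "mean": statistics.mean(lengths),
        # Assumption: sample standard deviation (needs at least two values).
        "std": statistics.stdev(lengths),
        "median": statistics.median(lengths),
        "min": min(lengths),
        "max": max(lengths),
    }


# Hypothetical toy values; not taken from the profiled dataset.
summary = length_summary(["athletics", "rugby", "football"])
print(summary)
```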

Sample

1st row Claxton hunting fi...
2nd row O'Sullivan could r...
3rd row Greene sets sights...
4th row IAAF launches figh...
5th row Dibaba breaks 5,00...

Letter

Count 1136265
Lowercase Letter 1080411
Space Separator 247435
Uppercase Letter 55854
Dash Punctuation 4945
Decimal Number 12588
  • text contains many words: 16882 words
  • The largest value (i) is over 2.0 times larger than the second largest value (said)
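
The Letter block reads like a tally of characters by Unicode general category, and its Count equals the sum of the lowercase and uppercase figures (1080411 + 55854 = 1136265). The sketch below shows one way to produce such a tally with `unicodedata`. The mapping of report labels to the categories Ll, Lu, Zs, Pd and Nd is an assumption, and the function name and sample string are hypothetical.

```python
import unicodedata
from collections import Counter

# Assumed mapping from Unicode general categories to the report's labels.
LABELS = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}


def char_categories(values):
    """Count characters per labelled Unicode category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            label = LABELS.get(unicodedata.category(ch))
            if label is not None:
                counts[label] += 1
    # "Count" is taken to be all cased letters, as the report's totals suggest.
    counts["Count"] = counts["Lowercase Letter"] + counts["Uppercase Letter"]
    return counts


# Hypothetical sample string; not taken from the profiled dataset.
tally = char_categories(["Ab c-1 2"])
print(dict(tally))
```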

topic

categorical

Approximate Distinct Count 5
Approximate Unique (%) 0.7%
Missing 0
Missing (%) 0.0%
Memory Size 51.9 KB

Length

Mean 7.0991
Standard Deviation 1.3542
Median 7
Minimum 5
Maximum 9

Sample

1st row athletics
2nd row athletics
3rd row athletics
4th row athletics
5th row athletics

Letter

Count 5232
Lowercase Letter 5232
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (football, rugby) take over 50.0%
  • The largest value (football) is over 1.8 times larger than the second largest value (rugby)
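
The two insights above describe class imbalance in topic: the ratio between the largest and second-largest category, and the combined share of the top two. A minimal sketch of both measures follows. The function name and the toy label counts are hypothetical, and the thresholds the tool uses to raise these insights are not stated in the report.

```python
from collections import Counter


def balance_insights(labels):
    """Return the top two categories, their count ratio, and their combined share."""
    ranked = Counter(labels).most_common()
    (top, n_top), (second, n_second) = ranked[0], ranked[1]
    return {
        "top": top,
        "second": second,
        "ratio": n_top / n_second,
        "top2_share": (n_top + n_second) / len(labels),
    }


# Hypothetical toy labels; the real per-category counts are not in the report.
labels = ["football"] * 4 + ["rugby"] * 2 + ["athletics"] + ["other"]
print(balance_insights(labels))
```

A ratio well above 1 or a top-two share above one half suggests stratified splitting or class weighting may be worth considering before training a classifier on these labels.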

Interactions

Missing Values